AITopics | theoretical result

Collaborating Authors

theoretical result

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Capturing Individual Human Preferences with Reward Features

Neural Information Processing SystemsJun-18-2026, 14:23:14 GMT

Reinforcement learning from human feedback usually models preferences using a reward function that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for disagreement, like in the training of large language models. We formalise and analyse the problem of learning a reward model that can be specialised to a user. Using the principle of empirical risk minimisation, we derive a probably approximately correct (PAC) bound showing the dependency of the approximation error on the number of training examples, as usual, and also on the number of human raters who provided feedback on them. Based on our theoretical findings, we discuss how to best collect pairwise preference data and argue that adaptive reward models should be beneficial when there is considerable disagreement among users.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country: Europe (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Adaptive Complexity of Minimizing Relative Fisher Information

Neural Information Processing SystemsJun-18-2026, 07:56:23 GMT

Non-log-concave sampling from an unnormalized density is fundamental in machine learning and statistics. As datasets grow larger, computational efficiency becomes increasingly important, particularly in reducing adaptive complexity, namely the number of sequential rounds required for sampling algorithms. In this work, we initiate the study of the adaptive complexity of non-log-concave sampling within the framework of relative Fisher information introduced by Balasubramanian et al. in 2022. To obtain a relative Fisher information of at most ε2 from the target distribution, we propose a novel algorithm that reduces the adaptive complexity from O(d2/ε4) to O(d/ε2) by leveraging parallelism. Furthermore, we show our algorithm is optimal for a specific regime of large ε. Our algorithm builds on a diagonally parallelized Picard iteration, while the lower bound is based on a reduction from the problem of finding stationary points.

artificial intelligence, complexity, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

On the Robustness of Transformers against Context Hijacking for Linear Classification

Neural Information Processing SystemsJun-17-2026, 21:27:39 GMT

Transformer-based Large Language Models (LLMs) have demonstrated powerful in-context learning capabilities. However, their predictions can be disrupted by factually correct context, a phenomenon known as context hijacking, revealing a significant robustness issue. To understand this phenomenon theoretically, we explore an in-context linear classification problem based on recent advances in linear transformers. In our setup, context tokens are designed as factually correct query-answer pairs, where the queries are similar to the final query but have opposite labels. Then, we develop a general theoretical analysis on the robustness of the linear transformers, which is formulated as a function of the model depth, training context lengths, and number of hijacking context tokens. A key finding is that a well-trained deeper transformer can achieve higher robustness, which aligns with empirical observations. We show that this improvement arises because deeper layers enable more fine-grained optimization steps, effectively mitigating interference from context hijacking. This is also well supported by our numerical and real-world experiments. Our findings provide theoretical insights into the benefits of deeper architectures and contribute to enhancing the understanding of transformer architectures.

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law Enforcement & Public Safety > Terrorism (1.00)
Education (0.69)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

7f6caf1f0ba788cd7953d817724c2b6e-Supplemental.pdf

Neural Information Processing SystemsApr-26-2026, 14:41:19 GMT

artificial intelligence, dataset, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

ADerivation of D1 Denote the logit vector as x, we have pj = exj

Neural Information Processing SystemsApr-26-2026, 13:17:00 GMT

Without zero-mean constraint, the training becomes unstable. Following the training setting of [23], the classifier network is trained with SGD with a weight decay 5e-4, an initial learning rate of 1e-1 and a mini-batch size of 100 for all methods. We use the cosine learning rate decay schedule [49] for a total of 80 epochs. We set the outer level learning ηω as 14 Figure 7: Training curve without zero-mean constraint on CIFAR10 under 40% uniform noise. The MLP weighting network is trained with Adam [51] with a fixed learning rate 1e-3 and a weight decay 1e-4.

artificial intelligence, experiment, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

4e0d67e54ad6626e957d15b08ae128a6-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 20:00:46 GMT

artificial intelligence, machine learning, theoretical result, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.56)

Add feedback

Approximations for the computation of m

Neural Information Processing SystemsApr-25-2026, 19:40:39 GMT

Providing a very low critical probability pc means that certification occurs when the simulation ends after a large number of iterations m. We introduce `c the threshold associated to pc s.t. Table 5 shows that this approximation is excellent even for large pc. This shows that mis a little larger than mc = log(pc)/log(1 1/N). This section assumes that X = xo + σ X with X N(0n; In) and that h(x) = x>g τ with g Rn and kgk= 1 (w.l.o.g.).

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

477bdb55b231264bb53a7942fd84254d-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 16:55:34 GMT

artificial intelligence, machine learning, ridge estimate, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Add feedback

Supplement to " Uniform Concentration Bounds toward a Unified Framework for Robust Clustering "

Neural Information Processing SystemsApr-25-2026, 16:23:28 GMT

For the theoretical exposition, we first establish the following Lemmas. Lemma A.1 proves that the derivative of the function φis bounded in the `2-norm when the domain is restricted to the support of P. Lemma A.1. Lemma A.3 proves that the function fΘ, as a function of Θ, is Lipschitz with respect to the k k norm. Joint first authors contributed equally Corresponding author 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Thus, from equation (1), h φ(PC(θ)) φ(θ),x PC(θ)i 0. (2) We now observe that, dφ(x,θ) dφ(x,PC(θ)) dφ(PC(θ),θ) = h φ(PC(θ)) φ(θ),x PC(θ)i 0. Hence the result.

artificial intelligence, machine learning, momnl, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

lower bound

Neural Information Processing SystemsApr-25-2026, 12:42:56 GMT

While there remains a small gap between our main lower bound of Theorem 3 and the deterministic quantised gradient descent of Section 6, we can show that the gap cannot be closed by improved deterministic algorithms where the coordinator learns value of objective function F(x) in addition to the minimiser x. That is, our quantised gradient descent is the communication-optimal deterministic algorithm for variant (1) for objectives with constant condition number. Recall that in the N-player equality over universe of size d, denoted by EQd,N, each player i is given an input bi 2{ 0,1}d, and the task is to decide if all players have the same input. It is known [33] that the deterministic communication complexity of EQd,N is CC(EQd,N)= ( Nd). Theorem 8. Given parameters N, d, ", 0 and = 0N satisfying d /" = (1), any deterministic protocol solving (1) for quadratic input functions x 7! 0kx x0k22 has communication complexity Nd log( d/"), if the coordinator is also required to output estimate r 2 R for the minimum function value such that Assume is a deterministic protocol solving (1) with communication complexity C .We show that can then solve N-party equality over a universe of size D = ( dlog( d/")), implying C = ( ND)= Nd log( d/") . More specifically, let S be the set given by Lemma 2 with =(2 "/)1/2, and let D = dlog|S|e = (dlog( d/")). Note that since we assume d /" = (1), the set S has at least two elements and D 1.

artificial intelligence, gradient descent, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.57)

Add feedback